Jeff Bier’s Impulse Response—When Scaling Gets Squirrelly

Submitted by Jeff Bier on Mon, 12/12/2005 - 17:00

It’s so tempting. You need to figure out how much processing power you’ll require to implement a particular video compression algorithm, and there, out on the Web, is the data you need—but for a slightly different scenario. Perhaps the data is for a smaller frame size than what you have in mind. Or maybe it’s for a low compressed bit rate, and your application will be using a higher one.

You tell yourself, “Well, I don’t have the exact data I need, so I’ll just multiply the data I do have by a scaling factor that will account for the differences in the workloads.” But that’s where things can go badly wrong.

Let’s say the data you have is for a small frame size. Your frame size is (for example) four times as big, so you are sorely tempted to multiply the processor loading data by a factor of four. But look out! It’s entirely possible that the smaller frame fits in on-chip memory, and the bigger frame doesn’t. If that’s the case, then as soon as you exceed the capacity of on-chip memory you may take a huge performance hit. At that point, the relationship between the processing power required for each of these scenarios can go nonlinear, and your simple scaling won’t mean much. In fact, because video algorithms tend to be extremely data-intensive, the size of on-chip memory and the bandwidth of off-chip memory are frequently the factors that limit performance.
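
To make the pitfall concrete, here is a minimal back-of-envelope sketch in Python. Every number in it (the QVGA and VGA frame sizes, the measured load, the on-chip memory capacity, and especially the external-memory penalty factor) is an illustrative assumption, not a measurement from any real processor; the point is only that a single memory threshold can break the linear scaling.

```python
# Back-of-envelope comparison: naive linear scaling vs. a model that applies
# a penalty once the frame no longer fits in on-chip memory.
# Every constant below is an illustrative assumption.

def naive_estimate(measured_mcps, measured_pixels, target_pixels):
    """Scale the measured load linearly with the pixel count."""
    return measured_mcps * (target_pixels / measured_pixels)

def memory_aware_estimate(measured_mcps, measured_pixels, target_pixels,
                          bytes_per_pixel=1.5,       # 4:2:0 video, assumed
                          on_chip_bytes=256 * 1024,  # assumed on-chip capacity
                          spill_penalty=2.5):        # assumed slowdown factor
    """Linear scaling, plus a multiplicative penalty once the frame's
    working set exceeds on-chip memory and traffic spills off-chip."""
    estimate = naive_estimate(measured_mcps, measured_pixels, target_pixels)
    if target_pixels * bytes_per_pixel > on_chip_bytes:
        estimate *= spill_penalty
    return estimate

qvga = 320 * 240   # frame size the published data was measured on
vga = 640 * 480    # frame size you actually need (four times the pixels)
measured = 100.0   # MCPS reported for QVGA (illustrative)

print(naive_estimate(measured, qvga, vga))         # 400.0 -- the tempting answer
print(memory_aware_estimate(measured, qvga, vga))  # 1000.0 -- after the spill penalty
```

The exact size of the penalty is unknowable without profiling on the actual chip; the sketch only shows that the estimate can jump discontinuously at the memory boundary, which no single scaling factor can capture.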

Other factors further complicate matters. For example, in typical video compression algorithms (like H.264), most of the blocks operate on pixels, but one very important block doesn’t: the entropy coding block. This block operates on coefficients, not pixels, and as a result its processing load is a function of the compressed bit rate, not the pixel rate. You could just ignore this block, but that would be a mistake. The entropy coding portion of a video compression algorithm can consume as much as a third of the processing power needed for the entire algorithm. So if you double the bit rate—even while keeping the pixels/second constant—you’re going to need a lot more horsepower.
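
As a rough illustration, again with assumed numbers, you can split the measured load into a pixel-rate-driven part and a bit-rate-driven entropy-coding part and scale them separately. The one-third entropy share below is the upper bound mentioned above, not a measured figure for any particular implementation.

```python
# Split an encoder's load into a pixel-rate-scaled part and a bit-rate-scaled
# part (the entropy coder), then scale each part separately. Numbers are assumed.

def scaled_load(total_mcps, entropy_fraction, pixel_ratio, bit_rate_ratio):
    """Scale the pixel-driven and entropy-coding portions independently."""
    pixel_part = total_mcps * (1.0 - entropy_fraction) * pixel_ratio
    entropy_part = total_mcps * entropy_fraction * bit_rate_ratio
    return pixel_part + entropy_part

baseline = 300.0  # MCPS at the reference frame size and bit rate (illustrative)

# Same pixel rate, double the compressed bit rate:
print(scaled_load(baseline, entropy_fraction=1/3,
                  pixel_ratio=1.0, bit_rate_ratio=2.0))   # 400.0 MCPS, a 33% jump
```

A scaling rule based on pixel rate alone would have predicted no change at all in this case.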

And even when you think the performance data you’ve found is an exact match for your scenario, you’re probably wrong. There are many subtle factors that can affect the processing load of a particular video algorithm, and you often won’t have enough information on these factors to assess the relevance of the data. For example, let’s say you’re planning to incorporate H.264 Baseline Profile encoding into your product, with a VGA frame size at 30 frames per second. And let’s say your processor vendor has performance data for that very scenario. That’s a good start, but until you’ve nailed down a number of additional variables, the data you’re looking at may do more to mislead than to inform. These variables include things like the quality of the video, what type of video content is being encoded, and the off-chip memory bandwidth. Without a detailed understanding of the circumstances under which the performance data was generated, you won’t be able to make a meaningful prediction of the performance you’ll achieve in your application.
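
One way to keep yourself honest is to write down the scenario behind any published number explicitly and compare it, field by field, against your own. The sketch below is a hypothetical checklist built from the variables discussed above; the field names are my own shorthand, not part of any standard benchmark description, and the list is far from exhaustive.

```python
# Hypothetical checklist for judging whether published performance data is
# comparable to your own scenario. Fields are drawn from the variables
# discussed above and are not exhaustive.

from dataclasses import dataclass, fields

@dataclass
class EncodeScenario:
    codec: str                          # e.g. "H.264 Baseline Profile"
    frame_size: tuple                   # (width, height), e.g. (640, 480)
    frame_rate_fps: float               # e.g. 30.0
    bit_rate_kbps: float
    content_type: str                   # e.g. "talking head" vs. "sports"
    quality_target: str                 # e.g. a PSNR target or subjective level
    off_chip_bandwidth_mbytes_s: float

def mismatches(published, target):
    """Names of the fields where the published scenario differs from yours."""
    return [f.name for f in fields(EncodeScenario)
            if getattr(published, f.name) != getattr(target, f.name)]
```

If the list of mismatches isn't empty, or if you can't even fill in a field for the published data, treat the number as a rough hint rather than a prediction.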

It’s human nature (or engineering nature, anyway) to want to extrapolate from one set of data to another. But in the realm of video processing, you need to do so carefully, or not at all.
